NSF PAR Search | NSF Public Access Repository



Guidelines for gene and genome assembly nomenclature

https://doi.org/10.1093/genetics/iyaf006

Cannon, Ethalinda KS; Molik, David C; Wright, Adam J; Zhang, Huiting; Honaas, Loren; Chougule, Kapeel; Dyer, Sarah (January 2025, GENETICS)
Harris, T (Ed.)
Abstract The rapid increase in the number of reference-quality genome assemblies presents significant new opportunities for genomic research. However, the absence of standardized naming conventions for genome assemblies and annotations across datasets creates substantial challenges. Inconsistent naming hinders the identification of correct assemblies, complicates the integration of bioinformatics pipelines, and makes it difficult to link assemblies across multiple resources. To address this, we developed a specification for standardizing the naming of reference genome assemblies, to improve consistency across datasets and facilitate interoperability. This specification was created with FAIR (Findable, Accessible, Interoperable, and Reusable) practices in mind, ensuring that reference assemblies are easier to locate, access, and reuse across research communities. Additionally, it has been designed to comply with primary genomic data repositories, including members of the International Nucleotide Sequence Database Collaboration consortium, ensuring compatibility with widely used databases. While initially tailored to the agricultural genomics community, the specification is adaptable for use across different taxa. Widespread adoption of this standardized nomenclature would streamline assembly management, better enable cross-species analyses, and improve the reproducibility of research. It would also enhance natural language processing applications that depend on consistent reference assembly names in genomic literature, promoting greater integration and automated analysis of genomic data. This is a good time to consider more consistent genomic data nomenclature as many research communities and data resources are now finding themselves juggling multiple datasets from multiple data providers.
Full Text Available
GrameneOryza: a comprehensive resource for Oryza genomes, genetic variation, and functional data

https://doi.org/10.1093/database/baaf021

Wei, Sharon; Chougule, Kapeel; Olson, Andrew; Lu, Zhenyuan; Tello-Ruiz, Marcela K; Kumar, Vivek; Kumari, Sunita; Zhang, Lifang; Olson, Audra; Kim, Catherine; et al (January 2025, Database)

Abstract Rice is a vital staple crop, sustaining over half of the global population, and is a key model for genetic research. To support the growing need for comprehensive and accessible rice genomic data, GrameneOryza (https://oryza.gramene.org) was developed as an online resource adhering to FAIR (Findable, Accessible, Interoperable, and Reusable) principles of data management. It distinguishes itself through its comprehensive multispecies focus, encompassing a wide variety of Oryza genomes and related species, and its integration with FAIR principles to ensure data accessibility and usability. It offers a community curated selection of high-quality Oryza genomes, genetic variation, gene function, and trait data. The latest release, version 8, includes 28 Oryza genomes, covering wild rice and domesticated cultivars. These genomes, along with Leersia perrieri and seven additional outgroup species, form the basis for 38K protein-coding gene family trees, essential for identifying orthologs, paralogs, and developing pan-gene sets. GrameneOryza’s genetic variation data features 66 million single-nucleotide variants (SNVs) anchored to the Os-Nipponbare-Reference-IRGSP-1.0 genome, derived from various studies, including the Rice Genome 3K (RG3K) project. The RG3K sequence reads were also mapped to seven additional platinum-quality Asian rice genomes, resulting in 19 million SNVs for each genome, significantly expanding the coverage of genetic variation beyond the Nipponbare reference. Of the 66 million SNVs on IRGSP-1.0, 27 million acquired standardized reference SNP cluster identifiers (rsIDs) from the European Variation Archive release v5. Additionally, 1200 distinct phenotypes provide a comprehensive overview of quantitative trait loci (QTL) features. The newly introduced Oryza CLIMtools portal offers insights into environmental impacts on genome adaptation. The platform’s integrated search interface, along with a BLAST server and curation tools, facilitates user access to genomic, phylogenetic, gene function, and QTL data, supporting broad research applications. Database URL: https://oryza.gramene.org
Full Text Available
MaizeCODE reveals bi-directionally expressed enhancers that harbor molecular signatures of maize domestication

https://doi.org/10.1038/s41467-024-55195-w

Cahn, Jonathan; Regulski, Michael; Lynn, Jason; Ernst, Evan; de_Santis_Alves, Cristiane; Ramakrishnan, Srividya; Chougule, Kapeel; Wei, Sharon; Lu, Zhenyuan; Xu, Xiaosa; et al (December 2024, Nature Communications)

Abstract Modern maize (Zea mays ssp. mays) was domesticated from Teosinte parviglumis (Zea mays ssp. parviglumis), with subsequent introgressions from Teosinte mexicana (Zea mays ssp. mexicana), yielding increased kernel row number, loss of the hard fruit case and dissociation from the cob upon maturity, as well as fewer tillers. Molecular approaches have identified transcription factors controlling these traits, yet revealed that a complex regulatory network is at play. MaizeCODE deploys ENCODE strategies to catalog regulatory regions in the maize genome, generating histone modification and transcription factor ChIP-seq in parallel with transcriptomics datasets in 5 tissues of 3 inbred lines which span the phenotypic diversity of maize, as well as the teosinte inbred TIL11. Transcriptomic analysis reveals that pollen grains share features with endosperm, and express dozens of “proto-miRNAs”, potential vestiges of gene drive and hybrid incompatibility. Integrated analysis with chromatin modifications results in the identification of a comprehensive set of regulatory regions in each tissue of each inbred, and notably of distal enhancers expressing non-coding enhancer RNAs bi-directionally, reminiscent of “super enhancers” in animal genomes. Furthermore, the morphological traits selected during domestication are recapitulated, both in gene expression and within regulatory regions containing enhancer RNAs, while highlighting the conflict between enhancer activity and silencing of the neighboring transposable elements.
Gapless assembly of maize chromosomes using long-read technologies

https://doi.org/10.1186/s13059-020-02029-9

Liu, Jianing; Seetharam, Arun S.; Chougule, Kapeel; Ou, Shujun; Swentowsky, Kyle W.; Gent, Jonathan I.; Llaca, Victor; Woodhouse, Margaret R.; Manchanda, Nancy; Presting, Gernot G.; et al (December 2020, Genome Biology)
Full Text Available
Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

https://doi.org/10.1186/s13059-019-1905-y

Ou, Shujun; Su, Weija; Liao, Yi; Chougule, Kapeel; Agda, Jireh R.; Hellinga, Adam J.; Lugo, Carlos Santiago; Elliott, Tyler A.; Ware, Doreen; Peterson, Thomas; et al (December 2019, Genome Biology)

Full Text Available
Effect of sequence depth and length in long-read assembly of the maize inbred NC358

https://doi.org/10.1038/s41467-020-16037-7

Ou, Shujun; Liu, Jianing; Chougule, Kapeel M.; Fungtammasan, Arkarachai; Seetharam, Arun S.; Stein, Joshua C.; Llaca, Victor; Manchanda, Nancy; Gilbert, Amanda M.; Wei, Sharon; et al (May 2020, Nature Communications)

Abstract Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75× genomic depth and with N50 subread lengths of 11–21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
Improved RNA-seq Workflows Using CyVerse Cyberinfrastructure

https://doi.org/10.1002/cpbi.53

Chougule, Kapeel M.; Wang, Liya; Stein, Joshua C.; Wang, Xiaofei; Devisetty, Upendra Kumar; Klein, Robert R.; Ware, Doreen (September 2018, Current Protocols in Bioinformatics)
De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes

https://doi.org/10.1126/science.abg5289

Hufford, Matthew B.; Seetharam, Arun S.; Woodhouse, Margaret R.; Chougule, Kapeel M.; Ou, Shujun; Liu, Jianing; Ricci, William A.; Guo, Tingting; Olson, Andrew; Qiu, Yinjie; et al (August 2021, Science)

We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
